假新闻,虚假或误导性信息作为新闻,对社会的许多方面产生了重大影响,例如在政治或医疗域名。由于假新闻的欺骗性,仅将自然语言处理(NLP)技术应用于新闻内容不足。多级社会上下文信息(新闻出版商和社交媒体的参与者)和用户参与的时间信息是假新闻检测中的重要信息。然而,正确使用此信息,介绍了三个慢性困难:1)多级社会上下文信息很难在没有信息丢失的情况下使用,2)难以使用时间信息以及多级社会上下文信息,3 )具有多级社会背景和时间信息的新闻表示难以以端到端的方式学习。为了克服所有三个困难,我们提出了一种新颖的假新闻检测框架,杂扫描。我们使用元路径在不损失的情况下提取有意义的多级社会上下文信息。 COMA-PATO,建议连接两个节点类型的复合关系,以捕获异构图中的语义。然后,我们提出了元路径实例编码和聚合方法,以捕获用户参与的时间信息,并生成新闻代表端到端。根据我们的实验,杂扫不断的性能改善了最先进的假新闻检测方法。
translated by 谷歌翻译
Single-image super-resolution (SISR) networks trained with perceptual and adversarial losses provide high-contrast outputs compared to those of networks trained with distortion-oriented losses, such as L1 or L2. However, it has been shown that using a single perceptual loss is insufficient for accurately restoring locally varying diverse shapes in images, often generating undesirable artifacts or unnatural details. For this reason, combinations of various losses, such as perceptual, adversarial, and distortion losses, have been attempted, yet it remains challenging to find optimal combinations. Hence, in this paper, we propose a new SISR framework that applies optimal objectives for each region to generate plausible results in overall areas of high-resolution outputs. Specifically, the framework comprises two models: a predictive model that infers an optimal objective map for a given low-resolution (LR) input and a generative model that applies a target objective map to produce the corresponding SR output. The generative model is trained over our proposed objective trajectory representing a set of essential objectives, which enables the single network to learn various SR results corresponding to combined losses on the trajectory. The predictive model is trained using pairs of LR images and corresponding optimal objective maps searched from the objective trajectory. Experimental results on five benchmarks show that the proposed method outperforms state-of-the-art perception-driven SR methods in LPIPS, DISTS, PSNR, and SSIM metrics. The visual results also demonstrate the superiority of our method in perception-oriented reconstruction. The code and models are available at https://github.com/seungho-snu/SROOE.
translated by 谷歌翻译
最近的成功表明,可以通过文本提示来操纵图像,例如,在雨天的晴天,在雨天中被操纵到同一场景中,这是由文本输入“下雨”驱动的雨天。这些方法经常利用基于样式的图像生成器,该生成器利用多模式(文本和图像)嵌入空间。但是,我们观察到,这种文本输入通常在提供和综合丰富的语义提示时被瓶颈瓶颈,例如将大雨与雨雨区分开。为了解决这个问题,我们主张利用另一种方式,声音,在图像操纵中具有显着优势,因为它可以传达出比文本更多样化的语义提示(生动的情感或自然世界的动态表达)。在本文中,我们提出了一种新颖的方法,该方法首先使用声音扩展了图像文本接头嵌入空间,并应用了一种直接的潜在优化方法来根据音频输入(例如雨的声音)操纵给定的图像。我们的广泛实验表明,我们的声音引导的图像操纵方法在语义和视觉上比最先进的文本和声音引导的图像操纵方法产生更合理的操作结果,这通过我们的人类评估进一步证实。我们的下游任务评估还表明,我们学到的图像文本单嵌入空间有效地编码声音输入。
translated by 谷歌翻译
Stylegan最近的成功表明,预训练的Stylegan潜在空间对现实的视频生成很有用。但是,由于难以确定stylegan潜在空间的方向和幅度,因此视频中产生的运动通常在语义上没有意义。在本文中,我们提出了一个框架来通过利用多模式(声音图像文本)嵌入空间来生成现实视频。由于声音提供了场景的时间上下文,因此我们的框架学会了生成与声音一致的视频。首先,我们的声音反演模块将音频直接映射到Stylegan潜在空间中。然后,我们结合了基于夹子的多模式嵌入空间,以进一步提供视听关系。最后,提出的帧发电机学会在潜在空间中找到轨迹,该空间与相应的声音相干,并以层次结构方式生成视频。我们为声音引导的视频生成任务提供新的高分辨率景观视频数据集(视听对)。实验表明,我们的模型在视频质量方面优于最新方法。我们进一步显示了几种应用程序,包括图像和视频编辑,以验证我们方法的有效性。
translated by 谷歌翻译
最近的研究通过卷积神经网络(CNNS)显着提高了单图像超分辨率(SR)的性能。虽然可以有许多用于给定输入的高分辨率(HR)解决方案,但大多数现有的基于CNN的方法在推理期间不会探索替代解决方案。获得替代SR结果的典型方法是培训具有不同丢失权重的多个SR模型,并利用这些模型的组合。我们通过利用多任务学习,我们提出了一种更有效的方法来培训单个可调SR模型的单一可调SR模型。具体地,我们在训练期间优化具有条件目标的SR模型,其中目标是不同特征级别的多个感知损失的加权之和。权重根据给定条件而变化,并且该组重量被定义为样式控制器。此外,我们提出了一种适用于该训练方案的架构,该架构是配备有空间特征变换层的残留残余密集块。在推理阶段,我们培训的模型可以在样式控制地图上生成局部不同的输出。广泛的实验表明,所提出的SR模型在没有伪影的情况下产生各种所需的重建,并对最先进的SR方法产生相当的定量性能。
translated by 谷歌翻译
最近的生成模型的成功表明,利用多模态嵌入空间可以使用文本信息操纵图像。然而,由于源的动态特性,使用其他来源而不是声音的文本来操纵图像,而不是声音,并不容易。特别是,声音可以传达真实世界的生动情感和动态表达。在这里,我们提出了一个框架,该框架将声音直接编码为多模态(图像文本)嵌入空间,并从空间操纵图像。我们的音频编码器受过培训以产生来自音频输入的潜在表示,该音频输入被强制与多模式嵌入空间中的图像和文本表示对齐。我们使用基于对齐的嵌入式的直接潜在优化方法进行声音引导图像操纵。我们还表明,我们的方法可以混合文本和音频模态,这丰富了各种图像修改。我们验证了定量和定性的声音引导图像操纵的有效性。我们还表明,我们的方法可以混合不同的模态,即文本和音频,这丰富了图像修改的各种。零射频分类和语义级图像分类的实验表明,我们所提出的模型优于其他文本和声音引导最先进的方法。
translated by 谷歌翻译
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
translated by 谷歌翻译
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive the tractable reformulation for our model. In particular, we show that that the return-risk model can also account for risk from uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
translated by 谷歌翻译
Modern deep neural networks have achieved superhuman performance in tasks from image classification to game play. Surprisingly, these various complex systems with massive amounts of parameters exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. \cite{Papyan20}. Recent papers have theoretically shown the global solutions to the training network problem under a simplified "unconstrained feature model" exhibiting this phenomenon. We take a step further and prove the Neural Collapse occurrence for deep linear network for the popular mean squared error (MSE) and cross entropy (CE) loss. Furthermore, we extend our research to imbalanced data for MSE loss and present the first geometric analysis for Neural Collapse under this setting.
translated by 谷歌翻译
Machine Reading Comprehension has become one of the most advanced and popular research topics in the fields of Natural Language Processing in recent years. The classification of answerability questions is a relatively significant sub-task in machine reading comprehension; however, there haven't been many studies. Retro-Reader is one of the studies that has solved this problem effectively. However, the encoders of most traditional machine reading comprehension models in general and Retro-Reader, in particular, have not been able to exploit the contextual semantic information of the context completely. Inspired by SemBERT, we use semantic role labels from the SRL task to add semantics to pre-trained language models such as mBERT, XLM-R, PhoBERT. This experiment was conducted to compare the influence of semantics on the classification of answerability for the Vietnamese machine reading comprehension. Additionally, we hope this experiment will enhance the encoder for the Retro-Reader model's Sketchy Reading Module. The improved Retro-Reader model's encoder with semantics was first applied to the Vietnamese Machine Reading Comprehension task and obtained positive results.
translated by 谷歌翻译